Abstract

We present a proof for the quantum channel coding theorem which relies 
on the fact that a randomly chosen code space typically is highly suitable for 
quantum error correction. In this sense, the proof is close to Shannon's original 
treatment of information transmission via a noisy classical channel. 

1 Preliminaries 
1.1 Quantum channel 

In the theory of information transmission, information is ascribed to the configuration of a physical system, and transmission is ascribed to the dynamical evolution of that configuration under the influence of a generally noisy environment. It is therefore customary to characterize an information-carrying system solely by its configuration space, and to consider its intrinsic dynamics as part of the transmission.

In a quantum setting we identify a system $Q$ with its Hilbert space, denoted by the same symbol $Q$. Its dimension $|Q|$ will always be assumed to be finite. The system's configuration is a quantum state described by a density operator $\rho$ in $B(Q)$, the set of bounded operators on $Q$.

The process of information transmission can be any dynamics of an open quantum system $Q$ according to which an initial input state $\rho$ evolves to a final output state $\rho'$, defining in this way the operation of a quantum channel $\mathcal{N}$. Mathematically, $\mathcal{N}$ is a completely positive mapping of $B(Q)$ into itself, or, when we admit that the system may change to another system $Q'$ during the course of transmission, into $B(Q')$, the set of bounded operators on $Q'$,

$$\mathcal{N} : B(Q) \to B(Q'), \qquad \rho \mapsto \rho' = \mathcal{N}(\rho).$$

According to Stinespring's theorem [3] the operation of the channel can always be understood as an isometric transformation followed by a restriction [2]. That is, one always finds an ancilla system $E$ with $|E| \ge 1$ and an isometric operation $V : Q \to Q'E$ such that for all states $\rho$

$$\mathcal{N}(\rho) = \operatorname{tr}_E V \rho V^\dagger,$$

where $\operatorname{tr}_E$ denotes the partial trace over $E$. In the following we refer to this construction as the Stinespring representation. An elementary physical interpretation of it becomes obvious in the case $Q = Q'$. Here one can find a unitary operator $U$ on $QE$ and a state vector $|\varphi_E\rangle \in E$ such that $V|\psi\rangle = U\,|\psi\rangle \otimes |\varphi_E\rangle$ for all state vectors $|\psi\rangle \in Q$. Interpreting $U$ as the time evolution operator of the joint system $QE$, an initial state $\rho \otimes \varphi_E$, where $\varphi_E = |\varphi_E\rangle\langle\varphi_E|$, will evolve to the final state $U \rho \otimes \varphi_E U^\dagger$. Its partial trace with respect to $E$ indeed yields $\mathcal{N}(\rho)$ as the reduced density operator for $Q$,

$$\operatorname{tr}_E U \rho \otimes \varphi_E U^\dagger = \operatorname{tr}_E V \rho V^\dagger = \mathcal{N}(\rho).$$

If we fix an orthonormal basis $|1\rangle, \dots, |N\rangle$ of $E$ (see footnote 2), the Stinespring representation can be rewritten more explicitly in an operator sum as

$$\mathcal{N}(\rho) = \sum_{k=1}^{N} A_k \rho A_k^\dagger,$$

where the Kraus operators $A_1, \dots, A_N : Q \to Q'$ are defined by $A_k|\psi\rangle := \langle k|V|\psi\rangle$ [2]. Because $V$ is an isometry the Kraus operators satisfy the completeness relation $\sum_{k=1}^{N} A_k^\dagger A_k = 1_Q$.
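As a small numerical illustration of the operator-sum representation and the completeness relation, the following sketch uses the qubit amplitude-damping channel. That channel, the parameter name `gamma`, and all helper functions are assumptions chosen for illustration; they do not appear in the text.

```python
import math

def dagger(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(a), len(a[0])
    return [[complex(a[r][c]).conjugate() for r in range(rows)] for c in range(cols)]

def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Entrywise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    """Trace of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def amplitude_damping_kraus(gamma):
    """Two Kraus operators of a qubit amplitude-damping channel (illustrative choice)."""
    a1 = [[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]]
    a2 = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]
    return [a1, a2]

def completeness(kraus):
    """Return sum_k A_k^dagger A_k, which equals 1_Q for a trace-preserving channel."""
    total = matmul(dagger(kraus[0]), kraus[0])
    for a in kraus[1:]:
        total = matadd(total, matmul(dagger(a), a))
    return total

def apply_channel(kraus, rho):
    """Operator-sum action rho -> sum_k A_k rho A_k^dagger."""
    out = matmul(matmul(kraus[0], rho), dagger(kraus[0]))
    for a in kraus[1:]:
        out = matadd(out, matmul(matmul(a, rho), dagger(a)))
    return out
```

Calling `completeness(amplitude_damping_kraus(0.3))` should return the 2x2 identity up to rounding, and `trace(apply_channel(...))` should stay equal to the trace of the input state, as expected for a trace-preserving channel.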

Below, we will often have to refer to the number of Kraus operators of a channel $\mathcal{N}$ in a certain operator-sum representation, which, of course, equals the dimension $|E|$ of the ancilla $E$ in the corresponding Stinespring representation. It is therefore convenient to define the length $|\mathcal{N}|$ of a channel $\mathcal{N}$ as the minimum number of Kraus operators in an operator-sum representation, or, equivalently, as the minimum dimension of an ancilla in a Stinespring representation needed to represent $\mathcal{N}$.

According to the above definition a quantum channel maps density operators to density operators, and therefore should be trace-preserving. As a matter of fact, it is sometimes advantageous to be less restrictive and to consider also trace-decreasing channels. Being still a completely positive mapping, a (in general) trace-decreasing channel $\mathcal{N} : B(Q) \to B(Q')$ has a Stinespring representation with an operator $V : Q \to Q'E$ satisfying $V^\dagger V \le 1_Q$. As a consequence, corresponding Kraus operators $A_1, \dots, A_N$ of $\mathcal{N}$ may be incomplete, meaning that $\sum_{k=1}^{N} A_k^\dagger A_k \le 1_Q$. Physically, a trace-decreasing channel describes a transmission that involves either some selective process or some leakage, as an effect of which a system does not necessarily reach its destination. This motivates us to call $\operatorname{tr}\mathcal{N}(\rho)$ the transmission probability of state $\rho$ with respect to $\mathcal{N}$.

2 Since we assumed the dimensions $|Q|$ and $|Q'|$ to be finite, the ancilla $E$ can also be chosen to be of finite dimension $|E| = N$.

1.2 Fidelities 

A frequently used quantity for measuring the distance of general quantum states is the fidelity [5]

$$F(\rho,\sigma) := \|\sqrt{\rho}\sqrt{\sigma}\|_{\mathrm{tr}}^2,$$

where $\|\cdot\|_{\mathrm{tr}}$ denotes the trace norm, $\|A\|_{\mathrm{tr}} = \operatorname{tr}\sqrt{A^\dagger A}$. If one of the states is pure, say $\rho = \psi = |\psi\rangle\langle\psi|$, this reduces to

$$F(\psi,\sigma) = \langle\psi|\sigma|\psi\rangle.$$

Generally, $0 \le F(\rho,\sigma) \le 1$, and $F(\rho,\sigma) = 1$ if and only if $\rho = \sigma$. The fidelity of two states is related to their trace norm distance by

$$1 - \|\rho - \sigma\|_{\mathrm{tr}} \le F(\rho,\sigma) \le 1 - \tfrac{1}{4}\|\rho - \sigma\|_{\mathrm{tr}}^2.$$

Furthermore, the fidelity is monotonic under quantum operations in the sense that for any trace-preserving completely positive $\mathcal{E} : B(Q) \to B(Q')$,

$$F(\rho,\sigma) \le F(\mathcal{E}(\rho),\mathcal{E}(\sigma)).$$

A remarkable theorem by Uhlmann [5] states that the fidelity of $\rho$ and $\sigma$ can also be understood as the maximum transition probability $|\langle\psi|\varphi\rangle|^2$ of purifications $\psi$ and $\varphi$ of $\rho$ and $\sigma$, respectively. The fidelity $F(\rho,\sigma)$ thus tells us how close two pure states $\psi$ and $\varphi$ of a universe can be if they are known to reduce to states $\rho$ and $\sigma$ on a subsystem $Q$. More precisely, the theorem states that if $\psi_{RQ}$ in $RQ$ is a purification of $\rho$, and if $\sigma$ can also be purified on $RQ$, then

$$F(\rho,\sigma) = \max_{\varphi_{RQ}} |\langle\psi_{RQ}|\varphi_{RQ}\rangle|^2,$$

where the maximum is taken over all purifications $\varphi_{RQ}$ of $\sigma$ in $RQ$ [6].

To determine how well a state $\rho$ is preserved under a channel $\mathcal{E} : B(Q) \to B(Q')$ we will generally use the entanglement fidelity [7]

$$F_e(\rho,\mathcal{E}) := \langle\psi_{RQ}|\,\mathcal{I}_R \otimes \mathcal{E}(\psi_{RQ})\,|\psi_{RQ}\rangle, \tag{1}$$

where $\psi_{RQ}$ is any purification of $\rho$ on $Q$ extended by an ancilla system $R$, and $\mathcal{I}_R$ is the identity operation on $R$. In terms of Kraus operators $A_1, \dots, A_{|\mathcal{E}|}$ of $\mathcal{E}$ the entanglement fidelity can be expressed as [7]

$$F_e(\rho,\mathcal{E}) = \sum_{k=1}^{|\mathcal{E}|} |\operatorname{tr}\rho A_k|^2. \tag{2}$$
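The Kraus-operator expression for the entanglement fidelity is easy to evaluate numerically. The sketch below does so for a qubit depolarizing channel; the choice of channel, the parameter `p`, and the helper names are illustrative assumptions that are not taken from the text.

```python
import math

def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Trace of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def scale(c, m):
    """Multiply every entry of matrix m by the scalar c."""
    return [[c * e for e in row] for row in m]

def depolarizing_kraus(p):
    """Kraus operators sqrt(1-p) 1, sqrt(p/3) X, sqrt(p/3) Y, sqrt(p/3) Z."""
    identity = [[1, 0], [0, 1]]
    pauli_x = [[0, 1], [1, 0]]
    pauli_y = [[0, -1j], [1j, 0]]
    pauli_z = [[1, 0], [0, -1]]
    kraus = [scale(math.sqrt(1 - p), identity)]
    kraus += [scale(math.sqrt(p / 3), m) for m in (pauli_x, pauli_y, pauli_z)]
    return kraus

def entanglement_fidelity(rho, kraus):
    """F_e(rho, E) = sum_k |tr(rho A_k)|^2 for Kraus operators A_k of E."""
    return sum(abs(trace(matmul(rho, a))) ** 2 for a in kraus)
```

For the maximally mixed qubit state only the identity term contributes, so `entanglement_fidelity([[0.5, 0], [0, 0.5]], depolarizing_kraus(p))` evaluates to `1 - p` up to rounding. Passing only a subset of the Kraus operators gives a value that is never larger, since every term in the sum is non-negative.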



The entanglement fidelity of a state $\rho = \sum_i p_i \psi_i$ is known to be a lower bound of the averaged fidelities $F(\psi_i,\mathcal{E}(\psi_i))$,

$$F_e(\rho,\mathcal{E}) \le \sum_i p_i F(\psi_i,\mathcal{E}(\psi_i)).$$

This relation becomes particularly useful if $\rho$ is chosen to be the normalized projection $\pi_C$ on a subspace $C$ of $Q$, $\pi_C = \Pi_C/|C|$. Then the entanglement fidelity yields a lower bound of the average subspace fidelity,

$$F_e(\pi_C,\mathcal{E}) \le \int_C d\psi\, F(\psi,\mathcal{E}(\psi)) =: F_{av}(C,\mathcal{E}), \tag{3}$$

where the integral is taken with respect to the normalized, unitarily invariant measure on $C$. Actually, there also exists a strict relation between the two fidelities [9],

$$F_{av}(C,\mathcal{E}) = \frac{|C|\,F_e(\pi_C,\mathcal{E}) + 1}{|C| + 1}.$$

We emphasize that with Eq. (1) also the entanglement fidelity with respect to a trace-decreasing channel $\mathcal{E}$ is defined. In this case representation (2) turns out to hold as well, leading to the following simple but nevertheless useful observation. Let a channel $\mathcal{E} : B(Q) \to B(Q')$ be defined by Kraus operators $A_1, \dots, A_{|\mathcal{E}|}$. We call a second channel $\tilde{\mathcal{E}} : B(Q) \to B(Q')$ a reduction of $\mathcal{E}$ if it can be represented by a subset of the Kraus operators $A_1, \dots, A_{|\mathcal{E}|}$, i.e.

$$\tilde{\mathcal{E}}(\rho) = \sum_{k \in N} A_k \rho A_k^\dagger, \qquad N \subset \{1, \dots, |\mathcal{E}|\}.$$

By Eq. (2) we notice that reducing a channel can never increase entanglement fidelity: for any reduction $\tilde{\mathcal{E}}$ of a channel $\mathcal{E}$

$$F_e(\rho,\tilde{\mathcal{E}}) \le F_e(\rho,\mathcal{E}). \tag{4}$$



2 Quantum coding theorem 

2.1 Quantum capacity of a quantum channel 

For the purpose of quantum-information transmission, Alice (sender) and Bob (receiver) may employ a quantum channel $\mathcal{N} : B(Q) \to B(Q')$ that conveys an input quantum system $Q$ from Alice to a generally different output system $Q'$ received by Bob.

In the simplest case, Alice may prepare quantum information in the form of some state $\rho$ of $Q$, which after transmission via the channel becomes a state $\rho' = \mathcal{N}(\rho)$ of $Q'$ received by Bob. In order to obtain Alice's originally sent state $\rho$, Bob may subject $\rho'$ to suitable physical manipulations, which eventually should result in a state $\rho''$ of $Q$ close to $\rho$. Mathematically, this corresponds to the application of a trace-preserving, completely positive mapping $\mathcal{R} : B(Q') \to B(Q)$, which we call the recovery operation in the following. Referring to Sec. 1.2, relation (3), the overall performance of this elementary transmission scheme can be conveniently assessed by the entanglement fidelity $F_e(\pi,\mathcal{R}\circ\mathcal{N})$ of the homogeneous density $\pi = 1_Q/|Q|$ of $Q$ with respect to $\mathcal{R}\circ\mathcal{N}$, or, if we suppose that Bob has optimized the recovery operation $\mathcal{R}$, by the maximized entanglement fidelity

$$\max_{\mathcal{R}} F_e(\pi,\mathcal{R}\circ\mathcal{N}). \tag{5}$$
R 

To improve the transmission scheme, Alice and Bob may agree upon using only states $\rho$ whose supports lie in a certain linear subspace $C$ of $Q$ (see footnote 3). A subspace used for this purpose is called a (quantum) code. Its size $k$ is defined as $k = \log_2|C|$, meaning that a pure state in $C$ carries $k$ qubits of quantum information [12]. Corresponding to (5), an appropriate quantity for assessing the suitability of a code $C$ for a channel $\mathcal{N}$ is the quantity

$$F_e(C,\mathcal{N}) := \max_{\mathcal{R}} F_e(\pi_C,\mathcal{R}\circ\mathcal{N}),$$

where $\pi_C = \Pi_C/|C|$ is the normalized projection on $C$ (again cf. Sec. 1.2). We refer to this quantity as the entanglement fidelity of the code $C$ with respect to the channel $\mathcal{N}$.

The definition involves a non-trivial optimization of the recovery operation $\mathcal{R}$. At first sight, this makes the code entanglement fidelity rather difficult to determine and therefore may cast doubts on its usefulness. However, following Schumacher and Westmoreland [13] we will derive a useful explicit lower bound for $F_e(C,\mathcal{N})$ in Sec. 4.

In the elementary transmission scheme considered so far the quantum information is encoded in single quantum systems $Q$ and transmitted in single uses ("shots") of the channel $\mathcal{N}$. As in classical communication schemes, the restriction to single-shot uses of the channel is very often far from optimal. Since the work of Shannon [14] it is known that encoding and transmission of information in large blocks yields much better results.

3 This can be advantageous when the interaction of system and environment affects states in $C$ significantly less than the average state, for instance, because $C$ obeys certain symmetries of the system-environment interaction Hamiltonian. Moreover, the restriction to a suitable subspace $C$ may allow Bob to employ quantum error-correcting schemes in the recovery operation $\mathcal{R}$ [10, 11].

In an $n$-block transmission scheme, Alice uses $n$ identical copies of the quantum system $Q$, in which she encodes quantum information as a state $\rho$ with support in a chosen code $C_n \subset Q^n$. During the transmission each individual system $Q$ is independently transformed by the channel $\mathcal{N}$, and Bob receives the state $\mathcal{N}^{\otimes n}(\rho)$, on which he applies a recovery operation $\mathcal{R}_n : B(Q'^n) \to B(Q^n)$. The crucial differences to a single-shot scheme are the usage of a code $C_n$ and a recovery operation $\mathcal{R}_n$ which in general will not obey the tensor product structure, i.e. $C_n \ne C_1^{\otimes n}$ and $\mathcal{R}_n \ne \mathcal{R}_1^{\otimes n}$. The rate $R = \frac{1}{n}\log_2|C_n|$ of an $n$-block code $C_n \subset Q^n$ denotes the average number of qubits encoded per system $Q$ and sent per channel use.

In the end, we wish to know up to which rate the channel $\mathcal{N}$ can reliably transmit quantum information when an optimal block code $C_n$ of arbitrarily large block number $n$ is used. This rate defines the quantum capacity $Q(\mathcal{N})$ of the channel $\mathcal{N}$ [15, 16, 17] (for a recent review see e.g. [18]). A mathematically precise definition uses the notion of an achievable rate. A rate $R$ is called achievable by the channel $\mathcal{N}$ if there is a sequence of codes $C_n \subset Q^n$, $n = 1, 2, \dots$, such that

$$\limsup_{n\to\infty} \frac{\log_2|C_n|}{n} \ge R, \quad\text{and}\quad \lim_{n\to\infty} F_e(C_n,\mathcal{N}^{\otimes n}) = 1.$$

The supremum of all achievable rates of a channel $\mathcal{N}$ is the quantum capacity $Q(\mathcal{N})$ of the channel $\mathcal{N}$.



2.2 Quantum coding theorem 

Determining the quantum capacity of a channel $\mathcal{N}$ poses one of the central problems of quantum information theory. It is partially solved by the quantum coding theorem [15, 16, 17], which relates quantum capacity to coherent information [19], the quantum analogue of mutual information in classical information theory. The coherent information is defined for a state $\rho$ with respect to a trace-preserving channel $\mathcal{N}$ as

$$I(\rho,\mathcal{N}) = S(\mathcal{N}(\rho)) - S_e(\rho,\mathcal{N}).$$

This is the von Neumann entropy of the channel output, $S(\mathcal{N}(\rho))$, minus the entropy exchange $S_e(\rho,\mathcal{N})$ between system and environment, which is given by

$$S_e(\rho,\mathcal{N}) = S(\mathcal{I}_R \otimes \mathcal{N}(\psi_{RQ})),$$

where $\psi_{RQ}$ is a purification of $\rho$, and $\mathcal{I}_R$ is the identity operation on the ancilla system $R$.

The quantum noisy coding theorem states that the quantum capacity $Q(\mathcal{N})$ of a channel $\mathcal{N}$ is the regularized coherent information $I_r(\mathcal{N})$ of $\mathcal{N}$,

$$Q(\mathcal{N}) = I_r(\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} \max_{\rho} I(\rho,\mathcal{N}^{\otimes n}).$$

n — >oo fi p 
The limiting procedure corresponds to the one in the definition of an achievable rate and thus reflects the fact that optimal coding can generally be reached only asymptotically, in the limit of block numbers $n \to \infty$. As a consequence of this limit the regularized coherent information, and thus the quantum capacity of a channel, is still difficult to determine.

The regularized coherent information has long been known to be an upper bound for $Q(\mathcal{N})$, which is the content of the converse coding theorem [16, 17]. The direct coding theorem, stating that $I_r(\mathcal{N})$ is actually attainable, was first rigorously proven by Devetak [20]. His proof utilizes a correspondence between classical private information and quantum information.

Sections 4, 5, 7, and 8 below represent the four stages of a different proof of the direct quantum coding theorem, of which an earlier version appeared in Ref. [21]. The working hypothesis underlying this proof is that randomly chosen block codes of sufficiently large block number typically allow for almost perfect quantum error correction. In this respect, the present proof as well as the one of Hayden et al. [22] and also the earlier approaches of Shor [23] and Lloyd [15] follow Shannon's original treatment [14] of the classical coding problem.

3 Outline of proof 

In the first stage of the proof (Sec. 4) we establish a lower bound for the code entanglement fidelity. It is essentially an earlier result of Schumacher and Westmoreland [13], which has recently also been put to good use by Abeysinghe et al. [23] and Hayden et al. [22] in the same context. The bound can be explicitly determined in terms of Kraus operators of the channel $\mathcal{N}$, and its use will relieve us from the burden of optimizing a recovery operation $\mathcal{R}$ for a given code $C$ and channel $\mathcal{N}$ in the course of proving the coding theorem. In deriving the lower bound the optimization of $\mathcal{R}$ is solved by means of Uhlmann's theorem.

In the next stage (Sec. 5) we investigate the error correcting ability of codes that are chosen at random from a unitarily invariant ensemble of codes with a given dimension $K$. Taking the average of the lower bound derived in Sec. 4 we will show the averaged code entanglement fidelity of a channel $\mathcal{N} : B(Q) \to B(Q')$ to obey

$$[F_e(C,\mathcal{N})]_K \ge \operatorname{tr}\mathcal{N}(\pi) - \sqrt{K|\mathcal{N}|}\;\|\mathcal{N}(\pi)\|_F, \tag{6}$$

where $\pi = 1_Q/|Q|$, and $\|\cdot\|_F$ denotes the Frobenius norm or two-norm.

In Sec. 6 we will illustrate the efficiency of random coding by means of the special case of a unital channel $\mathcal{U} : B(Q) \to B(Q')$, which by definition satisfies $\mathcal{U}(\pi) = \pi'$. In this case the lower bound (6) immediately proves the attainability of the quantum Hamming bound by random coding, and thus provides evidence for the validity of the above mentioned working hypothesis. Moreover, if we demand the channel $\mathcal{U}$ to be also uniform, as will be defined in Sec. 6, we can easily establish the coherent information $I(\pi,\mathcal{U})$ to be a lower bound of the quantum capacity $Q(\mathcal{U})$,

$$Q(\mathcal{U}) \ge I(\pi,\mathcal{U}).$$

The third stage of the proof (Sec. 7) is merely the generalization of this relation to an arbitrary channel $\mathcal{N} : B(Q) \to B(Q')$. To this end we have to consider $n$-block transmission schemes. For large $n$ it is possible to arrange for unitality and uniformity of $\mathcal{N}^{\otimes n}$ in an approximate sense by, as it will turn out, only minor modifications of $\mathcal{N}^{\otimes n}$. Approximate uniformity is achieved by reducing the operation $\mathcal{N}^{\otimes n}$ to an operation $\mathcal{N}_{\varepsilon,n}$ consisting only of typical Kraus operators. Furthermore, letting $\mathcal{N}_{\varepsilon,n}$ follow a projection on the typical subspace of $\mathcal{N}(\pi)$ in $Q'^n$ establishes an approximately uniform and unital channel $\tilde{\mathcal{N}}_{\varepsilon,n}$, which nevertheless is close to the original $\mathcal{N}^{\otimes n}$. In the end, this suffices to prove $Q(\mathcal{N}) \ge I(\pi,\mathcal{N})$ for a general channel $\mathcal{N}$. A corollary is that for any subspace $V \subset Q$ with normalized projection $\pi_V = \Pi_V/|V|$

$$Q(\mathcal{N}) \ge I(\pi_V,\mathcal{N}).$$

Finally, in Sec. 8 we employ a lemma of Bennett, Shor, Smolin, and Thapliyal (BSST) [25] in order to deduce from the last relation

$$Q(\mathcal{N}) \ge \frac{1}{m}\, I(\rho,\mathcal{N}^{\otimes m})$$

for an arbitrary integer $m$, and any density $\rho$ of $Q^m$. This shows the regularized coherent information to be a lower bound of $Q(\mathcal{N})$ and thus concludes the proof of the direct coding theorem.

4 A lower bound for the code entanglement fidelity 

Let a (possibly trace-decreasing) quantum channel $\mathcal{N} : B(Q) \to B(Q')$ have a Stinespring representation with an operator $V : Q \to Q'E$, and let $C \subset Q$ be a code whose normalized projection $\pi_C = \Pi_C/|C|$ has a purification $\psi_{RQ}$ on $RQ$, with $R$ being an appropriate ancilla system. Following Schumacher and Westmoreland we will establish

$$F_e(C,\mathcal{N}) \ge p - p\,\|\rho'_{RE} - \rho_R \otimes \rho'_E\|_{\mathrm{tr}}, \tag{7}$$

where $p = \operatorname{tr}\mathcal{N}(\pi_C)$, $\rho_R = \operatorname{tr}_Q \psi_{RQ}$, and the states $\rho'_{RE}$ and $\rho'_E$ are reduced density operators of the final normalized pure state

$$\psi'_{RQ'E} = \frac{1}{p}(1_R \otimes V)\,\psi_{RQ}\,(1_R \otimes V^\dagger), \tag{8}$$

$$\rho'_{RE} = \operatorname{tr}_{Q'} \psi'_{RQ'E}, \qquad \rho'_E = \operatorname{tr}_{RQ'} \psi'_{RQ'E}.$$

Furthermore, we will show that the lower bound (7) can alternatively be formulated in terms of Kraus operators $A_1, \dots, A_N$ of $\mathcal{N}$ as

$$F_e(C,\mathcal{N}) \ge p - \|D\|_{\mathrm{tr}}, \tag{9}$$

where

$$D = |C| \sum_{i,j=1}^{N} \left( \pi_C A_i^\dagger A_j \pi_C - \operatorname{tr}(\pi_C A_i^\dagger A_j \pi_C)\,\pi_C \right) \otimes |i\rangle\langle j|, \tag{10}$$

with $|1\rangle, \dots, |N\rangle$ being orthonormal states of some ancilla system.

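Proof of relation (7). We recall that the code entanglement fidelity involves a non-trivial optimization procedure of a recovery operation $\mathcal{R}$ (cf. Sec. 2.1). The idea is to hand over this job to Uhlmann's theorem. To this end we consider the pure state

$$\psi := \psi_{RQ} \otimes \psi'_{RQ'E}$$

of the joint system $RSQ'E$, where $S$ denotes a copy of $QR$. Obviously, $\psi$ is a purification of the state $\rho_R \otimes \rho'_E$ with respect to the ancilla $SQ'$. Next, we extend $\psi'_{RQ'E}$ by the operation

$$\mathcal{E} : B(Q') \to B(SQ'), \qquad \rho \mapsto \psi_S \otimes \rho,$$

where $\psi_S$ is any fixed pure state of $S$, to a pure state

$$\psi' := \mathcal{I}_R \otimes \mathcal{E} \otimes \mathcal{I}_E\,(\psi'_{RQ'E})$$

of $RSQ'E$. $\psi'$ is a purification of $\rho'_{RE}$ with respect to $SQ'$, since

$$\operatorname{tr}_{SQ'}\psi' = \operatorname{tr}_{Q'}\operatorname{tr}_S\psi' = \operatorname{tr}_{Q'}\psi'_{RQ'E} = \rho'_{RE}.$$

Now, let another purification $\varphi$ of $\rho'_{RE}$ in $RSQ'E$ maximize the transition amplitude to $\psi$,

$$|\langle\psi|\varphi\rangle|^2 = \max_{\chi\ \text{purification of}\ \rho'_{RE}} |\langle\psi|\chi\rangle|^2.$$

According to Uhlmann's theorem (cf. Sec. 1.2) we know that

$$|\langle\psi|\varphi\rangle|^2 = F(\rho_R \otimes \rho'_E,\ \rho'_{RE}). \tag{11}$$

Then, an optimal recovery operation $\mathcal{R} : B(Q') \to B(Q)$ can be constructed by means of a unitary operation $U_{SQ'}$ on $SQ'$ that rotates the actual (extended) final state $\psi'$ to the maximizing state $\varphi$,

$$\varphi = (1_R \otimes U_{SQ'} \otimes 1_E)\,\psi'\,(1_R \otimes U_{SQ'}^\dagger \otimes 1_E).$$

Keeping in mind that $S = QR$ we define

$$\mathcal{R}(\rho_{Q'}) := \operatorname{tr}_{RQ'}\, U_{SQ'}\,\mathcal{E}(\rho_{Q'})\,U_{SQ'}^\dagger,$$

and realize that for the state $\rho'_{RQ'} = \operatorname{tr}_E \psi'_{RQ'E}$

$$\mathcal{I}_R \otimes \mathcal{R}(\rho'_{RQ'}) = \operatorname{tr}_{RQ'}\,(1_R \otimes U_{SQ'})\;\mathcal{I}_R \otimes \mathcal{E}(\rho'_{RQ'})\;(1_R \otimes U_{SQ'}^\dagger)$$
$$= \operatorname{tr}_{RQ'E}\,(1_R \otimes U_{SQ'} \otimes 1_E)\,\psi'\,(1_R \otimes U_{SQ'}^\dagger \otimes 1_E)$$
$$= \operatorname{tr}_{RQ'E}\,\varphi,$$

where here and in the following the partial trace over $R$ refers to the second $R$ appearing in the product Hilbert space $RSQ'E = RQRQ'E$. Since further

$$\psi_{RQ} = \operatorname{tr}_{RQ'E}\,\psi,$$

we conclude

$$F_e(\pi_C,\mathcal{R}\circ\mathcal{N}) \ge p\,F(\psi_{RQ},\ \mathcal{I}_R \otimes \mathcal{R}(\rho'_{RQ'}))$$
$$= p\,F(\operatorname{tr}_{RQ'E}\psi,\ \operatorname{tr}_{RQ'E}\varphi)$$
$$\ge p\,|\langle\psi|\varphi\rangle|^2,$$

where the second inequality is due to the monotonicity of the fidelity under partial trace. With Eq. (11) and the general relation $F(\rho,\sigma) \ge 1 - \|\rho - \sigma\|_{\mathrm{tr}}$ this proves relation (7).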

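Proof of relation (9). We choose a purification $\psi_{RQ}$ of $\pi_C$ with a state vector

$$|\psi_{RQ}\rangle = \frac{1}{\sqrt{K}} \sum_{l=1}^{K} |c_l^R\rangle |c_l^Q\rangle,$$

where $K = |C|$, and $|c_1^R\rangle, \dots, |c_K^R\rangle$ and $|c_1^Q\rangle, \dots, |c_K^Q\rangle$ denote orthonormal vectors that span $R$ and $C$, respectively. Supposing that the orthonormal states $|1\rangle, \dots, |N\rangle$ span the ancilla $E$ and the Kraus operators $A_1, \dots, A_N$ are associated to $V$ by $A_i|\psi_Q\rangle = \langle i|V|\psi_Q\rangle$, we immediately obtain from Eq. (8)

$$p\,\psi'_{RQ'E} = \frac{1}{K} \sum_{l,m=1}^{K} \sum_{i,j=1}^{N} |c_l^R\rangle\langle c_m^R| \otimes A_i|c_l^Q\rangle\langle c_m^Q|A_j^\dagger \otimes |i\rangle\langle j|.$$

Hence

$$p\,\rho'_{RE} = \frac{1}{K} \sum_{l,m=1}^{K} \sum_{i,j=1}^{N} \langle c_m^Q|A_j^\dagger A_i|c_l^Q\rangle\; |c_l^R\rangle\langle c_m^R| \otimes |i\rangle\langle j|,$$

$$p\,\rho_R \otimes \rho'_E = \frac{1}{K^2} \sum_{m=1}^{K} |c_m^R\rangle\langle c_m^R| \otimes \sum_{l=1}^{K} \sum_{i,j=1}^{N} \langle c_l^Q|A_j^\dagger A_i|c_l^Q\rangle\; |i\rangle\langle j|.$$

The trace norm of $p(\rho'_{RE} - \rho_R \otimes \rho'_E)$ appearing in the lower bound (7) becomes more handy if we transform the operator difference by an isometry $J : B(RE) \to B(QE)$,

$$J : \sum_{lm,ij} \alpha_{lm,ij}\, |c_l^R\rangle\langle c_m^R| \otimes |i\rangle\langle j| \;\mapsto\; \sum_{lm,ij} \alpha^*_{lm,ij}\, |c_l^Q\rangle\langle c_m^Q| \otimes |i\rangle\langle j|.$$

$J$ shifts from $R$ to $Q$ and then complex conjugates with respect to the basis $|c_l^Q\rangle \otimes |i\rangle$, which clearly leaves the trace norm invariant. A straightforward calculation then shows

$$D := p\,J(\rho'_{RE} - \rho_R \otimes \rho'_E) = K \sum_{i,j=1}^{N} \left( \pi_C A_i^\dagger A_j \pi_C - \operatorname{tr}(\pi_C A_i^\dagger A_j \pi_C)\,\pi_C \right) \otimes |i\rangle\langle j|,$$

as in Eq. (10), and further

$$F_e(C,\mathcal{N}) \ge p - p\,\|\rho'_{RE} - \rho_R \otimes \rho'_E\|_{\mathrm{tr}} = p - p\,\|J(\rho'_{RE} - \rho_R \otimes \rho'_E)\|_{\mathrm{tr}} = p - \|D\|_{\mathrm{tr}},$$

which is what we wanted to prove.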

5 Random coding 

Let the unitarily invariant code ensemble of all $K$-dimensional codes $C \subset Q$ be defined by the ensemble average

$$[A(C)]_K := \int_{U(Q)} d\mu(U)\, A(U C_0) \tag{12}$$

of a code dependent variable $A(C)$. Here, $C_0$ is some fixed $K$-dimensional code space in $Q$, and $\mu$ is the normalized Haar measure on $U(Q)$, the group of all unitaries on $Q$. Below we will show that the ensemble averaged code entanglement fidelity of a (possibly trace-decreasing) channel $\mathcal{N} : B(Q) \to B(Q')$ obeys

$$[F_e(C,\mathcal{N})]_K \ge \operatorname{tr}\mathcal{N}(\pi) - \sqrt{K|\mathcal{N}|}\;\|\mathcal{N}(\pi)\|_F, \tag{13}$$

where $\pi = 1_Q/|Q|$ is the uniform density on $Q$.

We begin with the ensemble average of relation (9),

$$[F_e(C,\mathcal{N})]_K \ge [\operatorname{tr}\mathcal{N}(\pi_C)]_K - [\|D\|_{\mathrm{tr}}]_K, \tag{14}$$

where, as always, $\pi_C = \Pi_C/|C|$, and

$$D = K \sum_{i,j=1}^{N} \left( \pi_C A_i^\dagger A_j \pi_C - \operatorname{tr}(\pi_C A_i^\dagger A_j \pi_C)\,\pi_C \right) \otimes |i\rangle\langle j|, \tag{15}$$

with $A_1, \dots, A_N$ being $N = |\mathcal{N}|$ Kraus operators of a minimal operator-sum representation of $\mathcal{N}$. To average $\operatorname{tr}\mathcal{N}(\pi_C)$ we realize that $\rho \mapsto \operatorname{tr}\mathcal{N}(\rho)$ as a linear operation interchanges with the average. Since $[\pi_C]_K = \pi$ we thus obtain

$$[\operatorname{tr}\mathcal{N}(\pi_C)]_K = \operatorname{tr}\mathcal{N}([\pi_C]_K) = \operatorname{tr}\mathcal{N}(\pi).$$

Directly averaging the trace norm of $D$ turns out to be quite cumbersome. Therefore, we first estimate

$$[\|D\|_{\mathrm{tr}}]_K^2 \le KN\,[\|D\|_F]_K^2 \le KN\,[\|D\|_F^2]_K,$$

where $\|D\|_F = (\operatorname{tr} D^\dagger D)^{1/2}$ denotes the Frobenius norm (two-norm) of $D$. The first inequality follows from the general relation $\|A\|_{\mathrm{tr}} \le \sqrt{d}\,\|A\|_F$, where $d$ is the rank of $A$, and the second inequality is Jensen's inequality. This leads us to

$$[F_e(C,\mathcal{N})]_K \ge \operatorname{tr}\mathcal{N}(\pi) - \sqrt{KN\,[\|D\|_F^2]_K}, \tag{16}$$
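and it remains to determine the ensemble average of $\|D\|_F^2$. From the explicit representation Eq. (15) follows

$$\|D\|_F^2 = \operatorname{tr} D^\dagger D = \sum_{i,j=1}^{N} \left( \operatorname{tr}(\pi_C W_{ij}^\dagger \pi_C W_{ij}) - \frac{1}{K}\,|\operatorname{tr}\,\pi_C W_{ij}|^2 \right),$$

where the operators $W_{ij}$ are $W_{ij} := A_i^\dagger A_j$. It is useful to introduce a Hermitian form

$$b(V,W) := \left[ \operatorname{tr}(\pi_C V^\dagger \pi_C W) - \frac{1}{K}\operatorname{tr}(\pi_C V^\dagger)\operatorname{tr}(\pi_C W) \right]_K, \tag{17}$$

with which

$$[\|D\|_F^2]_K = \sum_{i,j=1}^{N} b(W_{ij},W_{ij}). \tag{18}$$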

The point is that the unitary invariance of the ensemble average entails the unitary invariance of $b$, i.e., for any $U \in U(Q)$

$$b(V,W) = b(UVU^\dagger, UWU^\dagger),$$

which, in fact, already determines $b$ to a large extent: According to Weyl's theory of group invariants [26, 27], $b(V,W)$ must be a linear combination of the only two fundamental unitarily invariant Hermitian forms $\operatorname{tr}V^\dagger W$ and $\operatorname{tr}V^\dagger\operatorname{tr}W$,

$$b(V,W) = \alpha\operatorname{tr}V^\dagger W + \beta\operatorname{tr}V^\dagger\operatorname{tr}W. \tag{19}$$

An elementary proof of this fact is outlined in Appendix A. To determine the coefficients $\alpha$ and $\beta$ we consider two special choices of the operators $V$ and $W$. For $V = W = 1_Q$ Eqs. (17) and (19) yield

$$\alpha M + \beta M^2 = 0, \tag{20}$$

where here and henceforth $M = |Q|$. Secondly, when we set $V$ and $W$ to a projection $\psi = |\psi\rangle\langle\psi|$ on $Q$ we obtain from Eq. (17)

$$b(\psi,\psi) = \frac{K-1}{K}\left[\langle\psi|\pi_C|\psi\rangle^2\right]_K.$$

Reverting to random matrix theory we find in Appendix B $[\langle\psi|\pi_C|\psi\rangle^2]_K = (1 + 1/K)/(M^2 + M)$, and hence

$$b(\psi,\psi) = \frac{1 - K^{-2}}{M^2 + M}.$$

With $b(\psi,\psi) = \alpha + \beta$ from Eq. (19) this yields the second equation,

$$\alpha + \beta = \frac{1 - K^{-2}}{M^2 + M}. \tag{21}$$

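Solving Eqs. (20) and (21) for $\alpha$ and $\beta$, and inserting the solution into (19) produces

$$b(V,W) = \frac{1 - K^{-2}}{M^2 - 1}\left(\operatorname{tr}V^\dagger W - \frac{1}{M}\operatorname{tr}V^\dagger\operatorname{tr}W\right),$$

and, by Eq. (18),

$$[\|D\|_F^2]_K = \frac{1 - K^{-2}}{M^2 - 1}\sum_{i,j=1}^{N}\left(\operatorname{tr}W_{ij}^\dagger W_{ij} - \frac{1}{M}\,|\operatorname{tr}W_{ij}|^2\right). \tag{22}$$

In general, not much is given away when we use the upper bound for $[\|D\|_F^2]_K$ that we obtain by using $(1 - 1/K^2)/(M^2 - 1) \le 1/M^2$ and by omitting the negative terms $-|\operatorname{tr}W_{ij}|^2/M$ in the sum. Then

$$[\|D\|_F^2]_K \le \frac{1}{M^2}\sum_{i,j=1}^{N}\operatorname{tr}A_j^\dagger A_i A_i^\dagger A_j = \operatorname{tr}\left(\frac{1}{M}\sum_{i=1}^{N}A_i A_i^\dagger\right)^2,$$

where we cyclically permuted operators under the trace to obtain the last equality. We realize that the argument of the trace is simply $\mathcal{N}(\pi)^2$ (with $\pi = 1_Q/M$). This yields the rather simple upper bound

$$[\|D\|_F^2]_K \le \|\mathcal{N}(\pi)\|_F^2, \tag{23}$$

which finally proves the lower bound (13) by relation (16).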



6 Unital and uniform channels 

The efficiency of random coding can be easily demonstrated by relation (13) for the case of a unital channel $\mathcal{U} : B(Q) \to B(Q')$, which by definition maps the homogeneously distributed input state $\pi$ to the homogeneously distributed output state $\pi'$. An example is a random unitary channel $\mathcal{U}_r : B(Q) \to B(Q)$, $\rho \mapsto \sum_i p_i U_i \rho U_i^\dagger$, where arbitrary unitary operators $U_1, \dots, U_N$ are applied with probabilities $p_1, \dots, p_N$ on the system $Q$.

Thus, for a unital channel $\|\mathcal{U}(\pi)\|_F = \|\pi'\|_F = |Q'|^{-1/2}$, which by relation (13) predicts the average entanglement fidelity of $K$-dimensional codes to obey

$$[F_e(C,\mathcal{U})]_K \ge 1 - \sqrt{\frac{K|\mathcal{U}|}{|Q'|}}.$$

This means that almost all codes of dimension $K$ allow for almost perfect correction of the unital noise $\mathcal{U}$, provided that

$$K|\mathcal{U}| \ll |Q'|.$$

Recalling that $|\mathcal{U}|$ is the number of Kraus operators in an operator-sum representation of $\mathcal{U}$, this relation clearly shows the attainability of the quantum Hamming bound [28] by random coding. Formally, this is equivalent to the lower bound

$$Q(\mathcal{U}) \ge \log_2|Q'| - \log_2|\mathcal{U}| \tag{24}$$

of the quantum information capacity of $\mathcal{U}$. To see this, we consider the $n$-times replicated noise $\mathcal{U}^{\otimes n}$, and study the averaged entanglement fidelity of codes with dimension $K_n = \lfloor 2^{nR} \rfloor$ for some positive rate $R$. Since with $\mathcal{U}$ also $\mathcal{U}^{\otimes n}$ is unital, and $|\mathcal{U}^{\otimes n}| = |\mathcal{U}|^n$, this time we arrive at

$$[F_e(C,\mathcal{U}^{\otimes n})]_{K_n} \ge 1 - \left(\frac{2^R\,|\mathcal{U}|}{|Q'|}\right)^{n/2}.$$

For $n \to \infty$ the right hand side converges to unity if $R < \log_2|Q'| - \log_2|\mathcal{U}|$. Hence, all rates below $\log_2|Q'| - \log_2|\mathcal{U}|$ are achievable, which by the definition of quantum capacity (cf. Sec. 2.1) shows relation (24).
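To get a feeling for how quickly the random-coding bound for unital channels approaches unity, the following sketch evaluates it numerically. The function names and the example numbers (a four-dimensional output system and a channel of length two) are hypothetical and chosen only for illustration; a trace-preserving unital channel is assumed.

```python
import math

def hamming_rate(kraus_len, out_dim):
    """Rate bound log2|Q'| - log2|U| below which random codes succeed."""
    return math.log2(out_dim) - math.log2(kraus_len)

def single_shot_bound(code_dim, kraus_len, out_dim):
    """Lower bound 1 - sqrt(K |U| / |Q'|) on the averaged code entanglement fidelity."""
    return 1.0 - math.sqrt(code_dim * kraus_len / out_dim)

def block_bound(rate, kraus_len, out_dim, n):
    """Same bound for n channel uses with code dimension K_n = floor(2^(n R))."""
    k_n = math.floor(2 ** (n * rate))
    return 1.0 - math.sqrt(k_n * kraus_len ** n / out_dim ** n)
```

With `kraus_len=2` and `out_dim=4` the rate bound is one qubit per channel use, and `block_bound(0.5, 2, 4, n)` increases towards 1 as `n` grows. For rates above `hamming_rate(...)` the bound becomes negative and therefore carries no information.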

Finally, let us assume that the channel $\mathcal{U}$ is also uniform, meaning that $\mathcal{U}$ has a minimal operator-sum representation with Kraus operators $A_1, \dots, A_{|\mathcal{U}|}$ obeying $\operatorname{tr}A_i^\dagger A_j = 0$ for $i \ne j$ and $\frac{1}{|Q|}\operatorname{tr}A_i^\dagger A_i = \text{const.} = |\mathcal{U}|^{-1}$. The first condition is actually no restriction, because a non-diagonal representation can always be transformed to a diagonal one (see footnote 4). The second condition demands that errors $E_i$ associated with Kraus operators $A_i$ appear with equal probability $p_i = 1/|\mathcal{U}|$. We observe that by Schumacher's representation [7] the entropy exchange of $\pi$ under a uniform $\mathcal{U}$ is simply given by

$$S_e(\pi,\mathcal{U}) = S(1_{|\mathcal{U}|}/|\mathcal{U}|) = \log_2|\mathcal{U}|.$$

Since $\mathcal{U}$ is unital we also have

$$S(\mathcal{U}(\pi)) = S(\pi') = \log_2|Q'|.$$

Comparing these expressions with relation (24) and recalling the definition of coherent information (cf. Sec. 2.2) establishes the lower bound

$$Q(\mathcal{U}) \ge I(\pi,\mathcal{U}).$$

In fact, in the following section we will show this bound to hold for general channels.

4 For arbitrary operation elements $B_1, \dots, B_N$ of $\mathcal{N}$, $N = |\mathcal{N}|$, let an $N \times N$ matrix $H$ be defined by $H_{ij} := \operatorname{tr}B_i^\dagger B_j$. Since $H = H^\dagger$ there is a unitary matrix $U$ such that $UHU^\dagger$ is diagonal. Because of the unitary freedom in the operator-sum representation [1], the operators $A_m := \sum_j U^*_{mj} B_j$ equivalently represent $\mathcal{N}$. It is readily verified that $\operatorname{tr}A_l^\dagger A_m = 0$ for $l \ne m$.
7 General channels

Starting again with relation (13) we will prove for a general channel $\mathcal{N} : B(Q) \to B(Q')$

$$Q(\mathcal{N}) \ge I(\pi,\mathcal{N}), \tag{25}$$

where $\pi = 1_Q/|Q|$, and, as a corollary,

$$Q(\mathcal{N}) \ge I(\pi_V,\mathcal{N}), \tag{26}$$

where $\pi_V$ is the normalized projection $\pi_V = \Pi_V/|V|$ on any subspace $V \subset Q$.

The strategy of the proof is to approximate $\mathcal{N}^{\otimes n}$ by an almost uniform and unital channel $\tilde{\mathcal{N}}_{\varepsilon,n}$, with which we then proceed as in the preceding section. We construct $\tilde{\mathcal{N}}_{\varepsilon,n}$ in two steps. The first step is to reduce $\mathcal{N}^{\otimes n}$ to its typical Kraus operators, as will be defined below. This yields an almost uniform operation $\mathcal{N}_{\varepsilon,n}$. In a second step, we let $\mathcal{N}_{\varepsilon,n}$ follow a projection on the typical subspace of $\mathcal{N}(\pi)$ in $Q'^n$, resulting in an operation $\tilde{\mathcal{N}}_{\varepsilon,n}$ with the desired properties.

We begin by briefly recalling definitions and basic properties of both typical sequences [29] and typical subspaces [12, 1].

7.1 Typical sequences

Let $X_1, X_2, X_3, \dots$ be independent random variables with an identical probability distribution $\mathcal{P}$ over an alphabet $\aleph$. Let $H(\mathcal{P}) = -\sum_{a \in \aleph} \mathcal{P}(a)\log_2\mathcal{P}(a)$ denote the Shannon entropy of $\mathcal{P}$, let $n$ be a positive integer, and let $\varepsilon$ be some positive number. A sequence $a = (a_1, a_2, \dots, a_n) \in \aleph^n$ is defined to be $\varepsilon$-typical if its probability of appearance $p_a = \mathcal{P}(a_1)\mathcal{P}(a_2)\cdots\mathcal{P}(a_n)$ satisfies

$$2^{-n(H(\mathcal{P})+\varepsilon)} \le p_a \le 2^{-n(H(\mathcal{P})-\varepsilon)}.$$

Let $\aleph_{\varepsilon,n}$ denote the set of all $\varepsilon$-typical sequences of length $n$.
Below we will make use of the following two well-known facts:

(i) The number $|\aleph_{\varepsilon,n}|$ of all $\varepsilon$-typical sequences of length $n$ is at most $2^{n(H(\mathcal{P})+\varepsilon)}$.

(ii) The probability $P_{\varepsilon,n} = \sum_{a \in \aleph_{\varepsilon,n}} p_a$ of a random sequence of length $n$ being $\varepsilon$-typical exceeds $1 - 2e^{-n\psi(\varepsilon)}$, where $\psi(\varepsilon)$ is a positive number independent of $n$.

Proofs can be found in Appendix C.
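The two facts can be checked numerically for a simple case. The sketch below counts the typical sequences of a binary alphabet by grouping sequences according to their number of ones; the binary alphabet, the function names, and the parameters `p_one`, `n`, and `eps` are assumptions made for illustration only.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a finite probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def typical_set_stats(p_one, n, eps):
    """Return (count, probability) of eps-typical binary sequences of length n.

    All sequences with k ones share the probability p_one^k (1 - p_one)^(n - k),
    so it suffices to loop over k and weight by the binomial coefficient.
    """
    h = shannon_entropy([p_one, 1.0 - p_one])
    count, prob = 0, 0.0
    for k in range(n + 1):
        log2_pa = k * math.log2(p_one) + (n - k) * math.log2(1.0 - p_one)
        if -n * (h + eps) <= log2_pa <= -n * (h - eps):
            multiplicity = math.comb(n, k)
            count += multiplicity
            prob += multiplicity * 2.0 ** log2_pa
    return count, prob
```

For instance, `typical_set_stats(0.2, 200, 0.1)` returns a count that stays below `2 ** (200 * (h + 0.1))`, with `h` the binary entropy of 0.2, while the typical set already carries most of the probability. Increasing `n` at fixed `eps` pushes that probability closer to one.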

7.2 Typical subspaces 

Let $\rho$ be some density operator of a quantum system $Q$, let $n$ be a positive integer, and let $\varepsilon$ be a positive number. An eigenvector $|v\rangle$ of $\rho^{\otimes n}$ is called $\varepsilon$-typical if its eigenvalue $p_v$ satisfies

$$2^{-n(S(\rho)+\varepsilon)} \le p_v \le 2^{-n(S(\rho)-\varepsilon)}.$$

The $\varepsilon$-typical subspace $T_{\varepsilon,n}$ of $\rho$ in $Q^{\otimes n}$ is defined as the span of all $\varepsilon$-typical eigenvectors of $\rho^{\otimes n}$. We denote the projection on $T_{\varepsilon,n}$ by $\Pi_{\varepsilon,n}$.

Notice that typical eigenvectors correspond to typical sequences when an orthonormal eigen-system $|v_1\rangle, \dots, |v_{|Q|}\rangle$ of $\rho$ is chosen as alphabet $\aleph$, a sequence of length $n$ over $\aleph$ is identified with an eigenvector $|v\rangle = |v_{j_1}\rangle|v_{j_2}\rangle\cdots|v_{j_n}\rangle$ of $\rho^{\otimes n}$, and the probability $\mathcal{P}(|v\rangle)$ of an eigenvector $|v\rangle$ of $\rho$ is taken to be its eigenvalue. Then, the above stated properties of typical sequences translate to

(i') The dimension of $T_{\varepsilon,n}$ is at most $2^{n(S(\rho)+\varepsilon)}$.

(ii') The probability $P_{\varepsilon,n} = \operatorname{tr}\Pi_{\varepsilon,n}\rho^{\otimes n}$ of measuring an $\varepsilon$-typical eigenvalue of $\rho^{\otimes n}$ exceeds $1 - 2e^{-n\psi(\varepsilon)}$, where $\psi(\varepsilon)$ is a positive number independent of $n$.



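7.3 Reduction of $\mathcal{N}^{\otimes n}$

Let a trace-preserving channel $\mathcal{N} : B(Q) \to B(Q')$ be given. $\mathcal{N}$ may be represented in a minimal operator sum with Kraus operators $A_1, \dots, A_{|\mathcal{N}|}$ which without loss of generality we assume to be diagonal, i.e. $\mathrm{tr}\, A_j^\dagger A_i = 0$ for $i \neq j$ (cf. footnote). Accordingly, $\mathcal{N}^{\otimes n}$ can be represented by $|\mathcal{N}|^n$ Kraus operators $A_{j_1} \otimes A_{j_2} \otimes \dots \otimes A_{j_n}$ where $j_\nu = 1, \dots, |\mathcal{N}|$.

Now, letting an alphabet $\aleph$ be defined as the set of Kraus operators $A_1, \dots, A_{|\mathcal{N}|}$ of $\mathcal{N}$, the Kraus operators of $\mathcal{N}^{\otimes n}$ can obviously be regarded as sequences over $\aleph$ of length $n$. In order to identify an $\epsilon$-typical sequence of length $n$, and with it also an $\epsilon$-typical Kraus operator of $\mathcal{N}^{\otimes n}$, we define a probability distribution $\mathcal{P}$ over $\aleph$ by

$$ \mathcal{P}(A) = \frac{1}{|Q|}\, \mathrm{tr}\, A^\dagger A , \qquad A \in \aleph . $$

The normalization of $\mathcal{P}$ follows from the completeness relation $\sum_{A \in \aleph} A^\dagger A = \mathbb{1}_Q$, and, owing to the diagonality of the Kraus operators, the Shannon entropy $H(\mathcal{P})$ turns out to agree with the entropy exchange $S_e(\pi,\mathcal{N})$: Again by Schumacher's representation

$$ S_e(\pi,\mathcal{N}) = S\Big( \sum_{i,j=1}^{|\mathcal{N}|} \mathrm{tr}(A_i \pi A_j^\dagger)\, |i\rangle\langle j| \Big) = S\Big( \sum_{i=1}^{|\mathcal{N}|} \mathcal{P}(A_i)\, |i\rangle\langle i| \Big) = H(\mathcal{P}) . $$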
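The identity $S_e(\pi,\mathcal{N}) = H(\mathcal{P})$ can be illustrated with a qubit Pauli channel, whose Kraus operators $\sqrt{p_i}\,\sigma_i$ are diagonal in the above sense. The sketch below uses plain Python lists for $2 \times 2$ matrices; the channel and its probabilities `probs` are hypothetical choices for the example, not taken from the text.

```python
import math

def dagger(a):
    """Conjugate transpose of a 2x2 matrix given as nested lists."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

# Identity and the three Pauli matrices.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def pauli_channel_kraus(probs):
    """Kraus operators sqrt(p_i) * sigma_i of a Pauli channel (assumed example)."""
    return [[[math.sqrt(p) * x for x in row] for row in sigma]
            for p, sigma in zip(probs, PAULIS)]

probs = [0.85, 0.05, 0.05, 0.05]  # hypothetical error probabilities
kraus = pauli_channel_kraus(probs)
dim_q = 2

# gram[i][j] = tr(A_i^dagger A_j) / |Q|; off-diagonal entries vanish
# exactly when the Kraus operators are diagonal in the sense of the text.
gram = [[trace(matmul(dagger(a), b)) / dim_q for b in kraus] for a in kraus]
P = [gram[i][i].real for i in range(4)]            # P(A_i)
H_P = -sum(p * math.log2(p) for p in P if p > 0)   # Shannon entropy H(P)
print(P, H_P)
```

Since the matrix with entries $\mathrm{tr}(A_i \pi A_j^\dagger)$ is diagonal here, its von Neumann entropy is simply the Shannon entropy of its diagonal, which is what `H_P` evaluates.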

Being in the possession of the probability distribution $\mathcal{P}$ over the set of Kraus operators $\aleph$, we can define the $\epsilon$-typical channel $\mathcal{N}_{\epsilon,n}$ of $\mathcal{N}^{\otimes n}$ to consist precisely of the operators $A$ that are $\epsilon$-typical with respect to $\mathcal{P}$,

$$ \rho \mapsto \mathcal{N}_{\epsilon,n}(\rho) := \sum_{A \in \aleph_{\epsilon,n}} A \rho A^\dagger . $$

As a direct consequence of properties (i) and (ii) of typical sequences one finds (cf. Appendix D)

$$ |\mathcal{N}_{\epsilon,n}| \le 2^{n(S_e(\pi,\mathcal{N})+\epsilon)} , $$
$$ \mathrm{tr}\, \mathcal{N}_{\epsilon,n}(\pi_n) \ge 1 - 2e^{-n\psi_1(\epsilon)} , $$

where $\pi_n = \mathbb{1}_{Q^{\otimes n}} / |Q|^n$, and $\psi_1(\epsilon)$ is a positive number independent of $n$. Furthermore, the relative weight $\frac{1}{|Q|^n}\mathrm{tr}\, A^\dagger A$ of an $\epsilon$-typical operator $A = A_{j_1} \otimes \dots \otimes A_{j_n}$ is just the probability $p_A = \mathcal{P}(A_{j_1}) \cdots \mathcal{P}(A_{j_n})$ and therefore obeys

$$ 2^{-n(S_e(\pi,\mathcal{N})+\epsilon)} \le p_A \le 2^{-n(S_e(\pi,\mathcal{N})-\epsilon)} . $$

Hence, keeping only the $\epsilon$-typical Kraus operators, the original channel $\mathcal{N}^{\otimes n}$ reduces to a channel $\mathcal{N}_{\epsilon,n}$ with Kraus operators $A \in \aleph_{\epsilon,n}$ of similar probability $p_A$. In general, this strongly reduces the number of Kraus operators from $|\mathcal{N}|^n$ to $|\mathcal{N}_{\epsilon,n}|$ and renders $\mathcal{N}_{\epsilon,n}$ much closer to a uniform channel than the original channel $\mathcal{N}^{\otimes n}$. At the same time, the transmission probability of the homogeneously mixed state $\pi_n$ deviates only by an exponentially small amount from unity.
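The size of this reduction can be made tangible with a sketch that counts the $\epsilon$-typical Kraus operators of an $n$-fold qubit Pauli channel. The channel with $\mathcal{P} = (p_{\mathrm{id}}, q, q, q)$, the parameter values and the function name are assumptions for illustration only. Because $p_A$ depends only on the number $k$ of non-identity tensor factors, the count reduces to a sum over $k$.

```python
import math

def typical_kraus_count(p_id, n, eps):
    """Count eps-typical Kraus strings of the n-fold Pauli channel with
    P = (p_id, q, q, q), q = (1 - p_id) / 3, and sum their probabilities p_A."""
    q = (1.0 - p_id) / 3.0
    h = -(p_id * math.log2(p_id) + 3 * q * math.log2(q))  # H(P) = S_e(pi, N)
    count, weight = 0, 0.0
    for k in range(n + 1):  # k = number of non-identity tensor factors
        log_p = (n - k) * math.log2(p_id) + k * math.log2(q)
        if -n * (h + eps) <= log_p <= -n * (h - eps):
            mult = math.comb(n, k) * 3 ** k  # strings with exactly k such factors
            count += mult
            # add mult * 2**log_p, evaluated via logarithms to avoid overflow
            weight += 2.0 ** (math.log2(mult) + log_p)
    return count, weight, h

# Hypothetical parameters, chosen only for the example.
n, eps = 200, 0.1
count, weight, h = typical_kraus_count(0.85, n, eps)
print(math.log2(count) / n, h + eps, 2.0, weight)
```

The printed rate $\frac{1}{n}\log_2|\mathcal{N}_{\epsilon,n}|$ lies below $S_e(\pi,\mathcal{N})+\epsilon$ and well below the rate $\log_2 4 = 2$ of the full set of $4^n$ Kraus operators, while `weight` corresponds to $\mathrm{tr}\,\mathcal{N}_{\epsilon,n}(\pi_n)$.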

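In order to achieve also approximate unitality, we will further modify the channel by letting $\mathcal{N}_{\epsilon,n}$ be followed by a projection $\mathcal{T}_{\epsilon,n} : \rho \mapsto \Pi_{\epsilon,n}\, \rho\, \Pi_{\epsilon,n}$ on the $\epsilon$-typical subspace $T_{\epsilon,n} \subset Q'^{\otimes n}$ of the density $\mathcal{N}(\pi)$. This defines the $\epsilon$-reduced operation of $\mathcal{N}^{\otimes n}$ by

$$ \tilde{\mathcal{N}}_{\epsilon,n} := \mathcal{T}_{\epsilon,n} \circ \mathcal{N}_{\epsilon,n} , $$

with the following properties shown in Appendix D:

$$ |\tilde{\mathcal{N}}_{\epsilon,n}| \le 2^{n(S_e(\pi,\mathcal{N})+\epsilon)} , \tag{27} $$
$$ \mathrm{tr}\, \tilde{\mathcal{N}}_{\epsilon,n}(\pi_n) \ge 1 - 4e^{-n\psi_3(\epsilon)} , \tag{28} $$
$$ \|\tilde{\mathcal{N}}_{\epsilon,n}(\pi_n)\|_F^2 \le 2^{-n(S(\mathcal{N}(\pi))-3\epsilon)} , \tag{29} $$

where $\psi_3(\epsilon)$ is a positive number independent of $n$. Now we are ready to prove the relation $Q(\mathcal{N}) \ge I(\pi,\mathcal{N})$: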

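7.4 $Q(\mathcal{N}) \ge I(\pi,\mathcal{N})$

We note that for any code $C \subset Q^{\otimes n}$

$$ F_e(C,\mathcal{N}^{\otimes n}) \ge F_e(C,\mathcal{N}_{\epsilon,n}) \ge F_e(C,\tilde{\mathcal{N}}_{\epsilon,n}) . \tag{30} $$

The first inequality holds because $\mathcal{N}_{\epsilon,n}$ is a reduction of $\mathcal{N}^{\otimes n}$ (cf. Sec. 1.2), and the second one follows from

$$ \max_{\mathcal{R}} F(\pi_C, \mathcal{R} \circ \mathcal{N}_{\epsilon,n}) \ge \max_{\mathcal{R}} F(\pi_C, \mathcal{R} \circ \mathcal{T}_{\epsilon,n} \circ \mathcal{N}_{\epsilon,n}) = \max_{\mathcal{R}} F(\pi_C, \mathcal{R} \circ \tilde{\mathcal{N}}_{\epsilon,n}) . $$

Averaging relation (30) over the unitary ensemble of codes $C \subset Q^{\otimes n}$ of dimension $K_n$ we immediately obtain with relation (13) and the bounds (27), (28), (29)

$$ \begin{aligned} [F_e(C,\mathcal{N}^{\otimes n})]_{K_n} &\ge \mathrm{tr}\, \tilde{\mathcal{N}}_{\epsilon,n}(\pi_n) - \sqrt{K_n\, |\tilde{\mathcal{N}}_{\epsilon,n}|}\; \|\tilde{\mathcal{N}}_{\epsilon,n}(\pi_n)\|_F \\ &\ge 1 - 4e^{-n\psi_3(\epsilon)} - 2^{\frac{n}{2}\left(R + S_e(\pi,\mathcal{N}) - S(\mathcal{N}(\pi)) + 4\epsilon\right)} . \end{aligned} $$

For all $\epsilon > 0$, the right-hand side of this inequality converges to unity in the limit $n \to \infty$ if the asymptotic rate $R$ obeys

$$ R + 4\epsilon < S(\mathcal{N}(\pi)) - S_e(\pi,\mathcal{N}) = I(\pi,\mathcal{N}) . $$

That is, all rates $R = \lim_{n\to\infty} \frac{1}{n} \log_2 K_n$ below $I(\pi,\mathcal{N})$ are achievable and therefore $I(\pi,\mathcal{N})$ is a lower bound of the capacity $Q(\mathcal{N})$.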
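For reference, the exponent in the averaged bound can be traced term by term. The following worked step is a sketch that assumes, for simplicity, $K_n = 2^{nR}$ exactly (in general $K_n$ only attains the rate $R$ asymptotically) and inserts the bounds (27) and (29):

```latex
\begin{align*}
\sqrt{K_n\,|\tilde{\mathcal{N}}_{\epsilon,n}|}\;
\|\tilde{\mathcal{N}}_{\epsilon,n}(\pi_n)\|_F
 &\le \sqrt{2^{nR}\cdot 2^{n(S_e(\pi,\mathcal{N})+\epsilon)}}
      \cdot 2^{-\frac{n}{2}\left(S(\mathcal{N}(\pi))-3\epsilon\right)} \\
 &= 2^{\frac{n}{2}\left(R + S_e(\pi,\mathcal{N}) + \epsilon
      - S(\mathcal{N}(\pi)) + 3\epsilon\right)} \\
 &= 2^{\frac{n}{2}\left(R + S_e(\pi,\mathcal{N})
      - S(\mathcal{N}(\pi)) + 4\epsilon\right)} .
\end{align*}
```

The exponent is negative, and the term therefore vanishes for $n \to \infty$, precisely when $R + 4\epsilon < S(\mathcal{N}(\pi)) - S_e(\pi,\mathcal{N})$.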
Relation (26) follows as a corollary:

7.5 $Q(\mathcal{N}) \ge I(\pi_V,\mathcal{N})$

Let $V$ be any subspace of the input Hilbert space $Q$ of a channel $\mathcal{N} : B(Q) \to B(Q')$, and let $\pi_V = \Pi_V / |V|$ be the normalized projection on $V$. The restriction of $\mathcal{N}$ to densities with support in $V$,

$$ \mathcal{N}_V : B(V) \to B(Q') , \qquad \rho \mapsto \mathcal{N}(\rho) , $$

is a channel for which the result of the previous subsection obviously predicts $I(\pi_V,\mathcal{N}_V)$ to be an achievable rate. It is evident that then $I(\pi_V,\mathcal{N}) = I(\pi_V,\mathcal{N}_V)$ is also an achievable rate of the complete channel $\mathcal{N}$. Thus, for any subspace $V \subset Q$

$$ Q(\mathcal{N}) \ge I(\pi_V,\mathcal{N}) . $$

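8 $Q(\mathcal{N}) \ge I_r(\mathcal{N})$

Finally, we will show that with the BSST lemma the result of the last subsection implies the lower bound

$$ Q(\mathcal{N}) \ge \frac{1}{m}\, I(\rho, \mathcal{N}^{\otimes m}) , $$

where $m$ is an arbitrarily large integer, and $\rho$ any density on $Q^{\otimes m}$. Clearly, this suffices to prove that the regularized coherent information $I_r(\mathcal{N})$ (cf. Sec. 2.2) is a lower bound of $Q(\mathcal{N})$.

The BSST lemma [25] states that for a channel $\mathcal{N}$ and an arbitrary state $\rho$ on the input space of $\mathcal{N}$

$$ \lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n}\, S\big(\mathcal{N}^{\otimes n}(\pi^{(f)}_{\epsilon,n})\big) = S(\mathcal{N}(\rho)) , $$

where $\pi^{(f)}_{\epsilon,n}$ is the normalized projection on the frequency-typical subspace $T^{(f)}_{\epsilon,n}$ of $\rho$. As a corollary, one obtains an analogous relation for the coherent information,

$$ \lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n}\, I(\pi^{(f)}_{\epsilon,n}, \mathcal{N}^{\otimes n}) = I(\rho,\mathcal{N}) . $$

$T^{(f)}_{\epsilon,n}$ is similar to the ordinary typical subspace $T_{\epsilon,n}$ which we have used above. The difference is that for $T^{(f)}_{\epsilon,n}$ typicality of a sequence is defined via the relative frequency of symbols in this sequence, whereas for $T_{\epsilon,n}$ it is defined by its total probability. For details we refer the reader to the work of Holevo [30], where an elegant proof of the BSST lemma is given.

Here, what matters is solely the fact that $\pi^{(f)}_{\epsilon,n}$ is a homogeneously distributed subspace density of the kind that we used in the previous subsection. Thus we can make use of the bound $Q(\mathcal{E}) \ge I(\pi_V,\mathcal{E})$ with, for instance, $\mathcal{E} = \mathcal{N}^{\otimes mn}$, and $V$ being the frequency-typical subspace $T^{(f)}_{\epsilon,n} \subset Q^{\otimes mn}$ of an arbitrary density $\rho$ on $Q^{\otimes m}$. This means that for any $\epsilon > 0$ and any $m$, $n$

$$ Q(\mathcal{N}^{\otimes mn}) \ge I(\pi^{(f)}_{\epsilon,n}, \mathcal{N}^{\otimes mn}) . $$

Using the trivial identity $Q(\mathcal{N}^{\otimes k}) = k\,Q(\mathcal{N})$ we can therefore write

$$ \begin{aligned} Q(\mathcal{N}) &= \frac{1}{m} \lim_{n \to \infty} \frac{1}{n}\, Q(\mathcal{N}^{\otimes mn}) \\ &\ge \frac{1}{m} \lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n}\, I\big(\pi^{(f)}_{\epsilon,n}, (\mathcal{N}^{\otimes m})^{\otimes n}\big) \\ &= \frac{1}{m}\, I(\rho, \mathcal{N}^{\otimes m}) , \end{aligned} $$

where the last equation follows from the corollary.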



I would like to thank Michal Horodecki and Milosz Michalski for inviting me to 
contribute to the present issue of OSID on the quantum coding theorem. 

A Unitary invariant Hermitian form 

Let $H$ be a finite dimensional Hilbert space with an orthonormal basis $|1\rangle, \dots, |N\rangle$, and let $b : B(H) \times B(H) \to \mathbb{C}$ be a unitary invariant Hermitian form. For $i,j \in \{1, \dots, N\}$ let $E_{ij} := |i\rangle\langle j|$. As a consequence of the unitary invariance one finds constants $\alpha$, $\beta$ and $\gamma$ such that for $i,j \in \{1, \dots, N\}$, $i \neq j$

$$ b(E_{ij},E_{ij}) = \alpha , $$
$$ b(E_{ii},E_{jj}) = \beta , $$
$$ b(E_{ii},E_{ii}) = \gamma , $$

and for all other combinations of indices $i,j,l,m \in \{1, \dots, N\}$

$$ b(E_{ij},E_{lm}) = 0 . $$

This immediately leads to

$$ b(V,W) = (\gamma - \alpha - \beta)\, b_1(V,W) + \beta\, \mathrm{tr}\, V^\dagger\, \mathrm{tr}\, W + \alpha\, \mathrm{tr}\, V^\dagger W , $$

with

$$ b_1(V,W) = \sum_{i=1}^{N} \langle i|V^\dagger|i\rangle \langle i|W|i\rangle . $$

Obviously, $b_1$ is not unitary invariant, from which we conclude $\gamma - \alpha - \beta = 0$ and thus

$$ b(V,W) = \beta\, \mathrm{tr}\, V^\dagger\, \mathrm{tr}\, W + \alpha\, \mathrm{tr}\, V^\dagger W , $$

which is what we wanted to prove.
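The claim that $b_1$ is not unitary invariant can be checked on a single example. The sketch below assumes that unitary invariance means $b(UVU^\dagger, UWU^\dagger) = b(V,W)$ (this reading is an assumption; the definition is given elsewhere in the paper) and compares $b_1$ with the invariant form $\mathrm{tr}\, V^\dagger W$ for $V = W = E_{11}$ on a qubit, conjugated by the Hadamard matrix.

```python
import math

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def conjugate_by(u, v):
    """Return U V U^dagger."""
    return matmul(matmul(u, v), dagger(u))

def b1(v, w):
    """b_1(V, W) = sum_i <i|V^dagger|i><i|W|i> in the fixed basis."""
    return sum(v[i][i].conjugate() * w[i][i] for i in range(len(v)))

def hs(v, w):
    """Hilbert-Schmidt form tr(V^dagger W), which is unitary invariant."""
    vd_w = matmul(dagger(v), w)
    return sum(vd_w[i][i] for i in range(len(v)))

s = 1 / math.sqrt(2)
U = [[s, s], [s, -s]]            # Hadamard matrix, a real unitary
E11 = [[1.0, 0.0], [0.0, 0.0]]   # E_11 = |1><1|
rotated = conjugate_by(U, E11)   # U E_11 U^dagger

print(b1(E11, E11), b1(rotated, rotated))   # values differ
print(hs(E11, E11), hs(rotated, rotated))   # values agree
```

Since $b_1$ changes under the conjugation while $\mathrm{tr}\, V^\dagger W$ does not, an invariant form $b$ cannot contain a $b_1$ contribution, in line with the conclusion $\gamma - \alpha - \beta = 0$.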

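B Average of $|\langle\psi|\pi_C|\psi\rangle|^2$

We show that independent of the normalized vector $|\psi\rangle \in Q$

$$ \big[\, |\langle\psi|\pi_C|\psi\rangle|^2 \,\big]_K = \frac{K+1}{K}\, \frac{1}{M^2+M} \tag{31} $$

(notations as in Sec. 5), where $M$ denotes the dimension of $Q$. By definition,

$$ \big[\, |\langle\psi|\pi_C|\psi\rangle|^2 \,\big]_K = \frac{1}{K^2} \int d\mu(U)\, |\langle\psi|U \Pi_0 U^\dagger|\psi\rangle|^2 , $$

where the integral extends over $U(Q)$ and $\Pi_0$ is the projection on an arbitrarily chosen linear subspace $C_0 \subset Q$ of dimension $K$. We extend $|\psi\rangle = |\psi_1\rangle$ to an orthonormal basis $|\psi_1\rangle, \dots, |\psi_M\rangle$ of $Q$, and choose

$$ C_0 := \mathrm{span}\{|\psi_1\rangle, \dots, |\psi_K\rangle\} . $$

Then

$$ \int d\mu(U)\, |\langle\psi|U \Pi_0 U^\dagger|\psi\rangle|^2 = \sum_{i,j=1}^{K} \int d\mu(U)\, |U_{1i}|^2 |U_{1j}|^2 , $$

where $U_{ij} = \langle\psi_i|U|\psi_j\rangle$. Making use of the unitary invariance of $\mu$, this becomes

$$ K \int d\mu(U)\, |U_{11}|^4 + (K^2 - K) \int d\mu(U)\, |U_{11}|^2 |U_{12}|^2 . $$

For the calculation of these integrals we refer to the work of Pereyra and Mello [31], in which, amongst others, the joint probability density for the elements $U_{11}, \dots, U_{1k}$ of a random unitary $n \times n$ matrix $U$ has been determined to be

$$ p(U_{11}, \dots, U_{1k}) = c\, \Big( 1 - \sum_{a=1}^{k} |U_{1a}|^2 \Big)^{n-k-1}\, \Theta\Big( 1 - \sum_{a=1}^{k} |U_{1a}|^2 \Big) , $$

where $c$ is a normalization constant, and $\Theta(x)$ denotes the standard unit step function. By a straightforward calculation, we obtain from this (with $n = M$)

$$ \int d\mu(U)\, |U_{11}|^4 = \frac{2}{M^2+M} , $$
$$ \int d\mu(U)\, |U_{11}|^2 |U_{12}|^2 = \frac{1}{M^2+M} , $$

which immediately leads to Eq. (31).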
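The two integrals lend themselves to a quick Monte Carlo check. The sketch below is not the calculation of [31]; it relies instead on the standard fact that the first row of a Haar-distributed unitary is a uniformly random unit vector in $\mathbb{C}^M$, which can be sampled by normalizing a vector of complex Gaussians. The dimension, sample size and seed are arbitrary choices, and the function names are hypothetical.

```python
import math
import random
from fractions import Fraction

def average_overlap_squared(K, M):
    """Closed form (1/K^2) [K * 2/(M^2+M) + (K^2-K) * 1/(M^2+M)] from the text."""
    denom = M * M + M
    return Fraction(1, K * K) * (K * Fraction(2, denom)
                                 + (K * K - K) * Fraction(1, denom))

def random_unit_row(M, rng):
    """First row of a Haar-random unitary: a uniformly random unit vector in C^M."""
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in z))
    return [x / norm for x in z]

def monte_carlo_moments(M, samples, seed=0):
    """Estimate the averages of |U_11|^4 and |U_11|^2 |U_12|^2."""
    rng = random.Random(seed)
    m4 = m22 = 0.0
    for _ in range(samples):
        row = random_unit_row(M, rng)
        a, b = abs(row[0]) ** 2, abs(row[1]) ** 2
        m4 += a * a
        m22 += a * b
    return m4 / samples, m22 / samples

M = 3
m4, m22 = monte_carlo_moments(M, samples=50000, seed=1)
print(m4, 2 / (M * M + M))    # estimate vs. 2/(M^2+M)
print(m22, 1 / (M * M + M))   # estimate vs. 1/(M^2+M)
print(average_overlap_squared(2, 4), average_overlap_squared(5, 5))
```

As a consistency check of the closed form, the case $K = M$ (the code is the whole space, $\pi_C = \mathbb{1}/M$) gives $1/M^2$, as it must.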



C Typical Sequences 

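The first property follows from

$$ 1 = \sum_{a \in \aleph^n} p_a \ge \sum_{a \in \aleph_{\epsilon,n}} p_a \ge |\aleph_{\epsilon,n}|\, 2^{-n(H(\mathcal{P})+\epsilon)} . $$

To prove the second property we first realize that by definition

$$ \begin{aligned} P_{\epsilon,n} &= \Pr\big( \text{``}a \in \aleph^n \text{ is } \epsilon\text{-typical''} \big) = \Pr\big( |-\log_2(p_a) - nH(\mathcal{P})| \le n\epsilon \big) \\ &= \Pr\Big( \Big| \sum_{i=1}^{n} \big( -\log_2 \mathcal{P}(a_i) - H(\mathcal{P}) \big) \Big| \le n\epsilon \Big) . \end{aligned} $$

The negative logarithms of the probabilities $\mathcal{P}(a_i)$ can be understood as $n$ independent random variables $Y_i$ that assume values $-\log_2 \mathcal{P}(a)$ for all $a \in \aleph$ with probabilities $\mathcal{P}(a)$. Their mean is the Shannon entropy $H(\mathcal{P})$,

$$ \mu = E(Y_i) = -\sum_{a \in \aleph} \mathcal{P}(a) \log_2 \mathcal{P}(a) = H(\mathcal{P}) . $$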
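The random variable $Y_i$ is easy to tabulate for a concrete distribution. The following sketch computes its mean, which reproduces the Shannon entropy, and additionally its variance; the three-letter distribution is a hypothetical example and the function name is an assumption.

```python
import math

def log_likelihood_moments(dist):
    """Mean and variance of Y = -log2 P(a), where a is drawn from dist."""
    mean = sum(p * -math.log2(p) for p in dist.values() if p > 0)
    var = sum(p * (-math.log2(p) - mean) ** 2 for p in dist.values() if p > 0)
    return mean, var

# Hypothetical example distribution (an assumption, not taken from the text).
dist = {"a": 0.5, "b": 0.25, "c": 0.25}
mu, sigma2 = log_likelihood_moments(dist)
print(mu, sigma2)
```

Here $Y$ takes the value 1 with probability 1/2 and the value 2 with probability 1/2, so the mean is 1.5 bits and the variance is 0.25.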

With $\mu = H(\mathcal{P})$, this means that

$$ 1 - P_{\epsilon,n} = \Pr\Big( \Big| \sum_{i=1}^{n} (Y_i - \mu) \Big| > n\epsilon \Big) $$

is the probability of a large deviation $\propto n$. Since the variance $\sigma^2$ and all higher moments of $Y_i - \mu$ are finite we can employ a result from the theory of large deviations [32], according to which

$$ \Pr\Big( \Big| \sum_{i=1}^{n} (Y_i - \mu) \Big| > n\epsilon \Big) \le 2e^{-n\psi(\epsilon)} , $$

where $\psi(\epsilon)$ is a positive number that is approximately $\epsilon^2 / 2\sigma^2$.

D Properties of $\mathcal{N}_{\epsilon,n}$ and $\tilde{\mathcal{N}}_{\epsilon,n}$

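We will show the following relations (definitions and notations as in Sec. 7.3):

$$ |\mathcal{N}_{\epsilon,n}| \le 2^{n(S_e(\pi,\mathcal{N})+\epsilon)} \tag{32} $$
$$ \mathrm{tr}\, \mathcal{N}_{\epsilon,n}(\pi_n) \ge 1 - 2e^{-n\psi_1(\epsilon)} \tag{33} $$
$$ |\tilde{\mathcal{N}}_{\epsilon,n}| \le 2^{n(S_e(\pi,\mathcal{N})+\epsilon)} \tag{34} $$
$$ \mathrm{tr}\, \tilde{\mathcal{N}}_{\epsilon,n}(\pi_n) \ge 1 - 4e^{-n\psi_3(\epsilon)} \tag{35} $$
$$ \|\tilde{\mathcal{N}}_{\epsilon,n}(\pi_n)\|_F^2 \le 2^{-n(S(\mathcal{N}(\pi))-3\epsilon)} , \tag{36} $$

where $\psi_1(\epsilon)$ and $\psi_3(\epsilon)$ are positive numbers independent of $n$.

The first relation follows from $|\mathcal{N}_{\epsilon,n}| = |\aleph_{\epsilon,n}| \le 2^{n(H(\mathcal{P})+\epsilon)}$ and $H(\mathcal{P}) = S_e(\pi,\mathcal{N})$. To prove relation (33) we note that for a Kraus operator $A = A_{j_1} \otimes \dots \otimes A_{j_n}$

$$ \frac{1}{|Q|^n}\, \mathrm{tr}\, A^\dagger A = \frac{1}{|Q|}\, \mathrm{tr}\, A_{j_1}^\dagger A_{j_1} \cdots \frac{1}{|Q|}\, \mathrm{tr}\, A_{j_n}^\dagger A_{j_n} = \mathcal{P}(A_{j_1}) \cdots \mathcal{P}(A_{j_n}) \equiv p_A . $$

Making use of property (ii) of typical sequences this shows

$$ \mathrm{tr}\, \mathcal{N}_{\epsilon,n}(\pi_n) = \frac{1}{|Q|^n} \sum_{A \in \aleph_{\epsilon,n}} \mathrm{tr}\, A^\dagger A = \sum_{A \in \aleph_{\epsilon,n}} p_A \ge 1 - 2e^{-n\psi_1(\epsilon)} , $$

where $\psi_1(\epsilon)$ is a positive number independent of $n$. Relation (34) is evident by relation (32) and

$$ \tilde{\mathcal{N}}_{\epsilon,n}(\rho) = \Pi_{\epsilon,n}\, \mathcal{N}_{\epsilon,n}(\rho)\, \Pi_{\epsilon,n} = \sum_{A \in \aleph_{\epsilon,n}} (\Pi_{\epsilon,n} A)\, \rho\, (\Pi_{\epsilon,n} A)^\dagger . $$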

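In order to show (35) it is convenient to introduce the complementary operation $\mathcal{M}_{\epsilon,n}$ of $\mathcal{N}_{\epsilon,n}$ by

$$ \mathcal{N}^{\otimes n} = \mathcal{N}_{\epsilon,n} + \mathcal{M}_{\epsilon,n} , $$

i.e., $\mathcal{M}_{\epsilon,n}$ consists of the $\epsilon$-"untypical" Kraus operators of $\mathcal{N}^{\otimes n}$,

$$ \mathcal{M}_{\epsilon,n}(\rho) = \sum_{A \in \aleph^n \setminus \aleph_{\epsilon,n}} A \rho A^\dagger . $$

Then,

$$ \begin{aligned} \mathrm{tr}\, \tilde{\mathcal{N}}_{\epsilon,n}(\pi_n) &= \mathrm{tr}\, \Pi_{\epsilon,n} \big( \mathcal{N}^{\otimes n}(\pi_n) - \mathcal{M}_{\epsilon,n}(\pi_n) \big) \\ &\ge \mathrm{tr}\, \Pi_{\epsilon,n}\, \mathcal{N}^{\otimes n}(\pi_n) - \mathrm{tr}\, \mathcal{M}_{\epsilon,n}(\pi_n) . \end{aligned} \tag{37} $$

The inequality results from the fact that for two positive operators $A$, $B$ always $\mathrm{tr}\, AB \ge 0$, and therefore (indices suppressed)

$$ \mathrm{tr}\, \mathcal{M}(\rho) = \mathrm{tr}\, \Pi \mathcal{M}(\rho) + \mathrm{tr}\, (\mathbb{1} - \Pi) \mathcal{M}(\rho) \ge \mathrm{tr}\, \Pi \mathcal{M}(\rho) . $$

Taking into account that $\Pi_{\epsilon,n}$ projects on the typical subspace $T_{\epsilon,n}$ of $\mathcal{N}(\pi)$ and using property (ii') of typical subspaces, the first term in Eq. (37) can be bounded from below as

$$ \mathrm{tr}\, \Pi_{\epsilon,n}\, \mathcal{N}^{\otimes n}(\pi_n) = \mathrm{tr}\, \Pi_{\epsilon,n}\, \mathcal{N}^{\otimes n}(\pi^{\otimes n}) = \mathrm{tr}\, \Pi_{\epsilon,n}\, (\mathcal{N}(\pi))^{\otimes n} \ge 1 - 2e^{-n\psi_2(\epsilon)} . $$

The second term in Eq. (37) obeys

$$ \mathrm{tr}\, \mathcal{M}_{\epsilon,n}(\pi_n) = \mathrm{tr}\, \mathcal{N}^{\otimes n}(\pi_n) - \mathrm{tr}\, \mathcal{N}_{\epsilon,n}(\pi_n) \le 2e^{-n\psi_1(\epsilon)} , $$

by relation (33). We thus find

$$ \mathrm{tr}\, \tilde{\mathcal{N}}_{\epsilon,n}(\pi_n) \ge 1 - 2\big( e^{-n\psi_2(\epsilon)} + e^{-n\psi_1(\epsilon)} \big) \ge 1 - 4e^{-n\psi_3(\epsilon)} , $$

when $\psi_3(\epsilon) := \min\{\psi_1(\epsilon), \psi_2(\epsilon)\}$.

Finally, we address the Frobenius norm of $\tilde{\mathcal{N}}_{\epsilon,n}(\pi_n)$. For positive operators $A$, $B$

$$ \|A + B\|_F^2 = \|A\|_F^2 + \|B\|_F^2 + 2\, \mathrm{tr}\, AB \ge \|A\|_F^2 + \|B\|_F^2 . $$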
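The superadditivity of the squared Frobenius norm for positive operators is easily verified numerically. The sketch below builds two random real positive semidefinite matrices as $X^T X$ and $Y^T Y$ (the dimension and the seed are arbitrary assumptions) and evaluates both sides of the relation.

```python
import random

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def frobenius_sq(a):
    """Squared Frobenius norm of a real matrix."""
    return sum(x * x for row in a for x in row)

def random_psd(n, rng):
    """Random real positive semidefinite matrix X^T X."""
    x = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return matmul(transpose(x), x)

rng = random.Random(1)   # arbitrary seed for reproducibility
A = random_psd(3, rng)
B = random_psd(3, rng)

tr_ab = trace(matmul(A, B))                   # non-negative for positive A, B
lhs = frobenius_sq(add(A, B))                 # ||A + B||_F^2
norm_a, norm_b = frobenius_sq(A), frobenius_sq(B)
print(lhs, norm_a + norm_b + 2 * tr_ab, norm_a + norm_b, tr_ab)
```

The first two printed numbers agree up to rounding, and both exceed the third by $2\,\mathrm{tr}\,AB$.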

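This superadditivity of the squared Frobenius norm can be used to derive

$$ \| \mathcal{T}_{\epsilon,n} \circ \mathcal{N}^{\otimes n}(\pi_n) \|_F^2 = \| \mathcal{T}_{\epsilon,n} \circ (\mathcal{N}_{\epsilon,n} + \mathcal{M}_{\epsilon,n})(\pi_n) \|_F^2 \ge \| \mathcal{T}_{\epsilon,n} \circ \mathcal{N}_{\epsilon,n}(\pi_n) \|_F^2 . $$

Thus

$$ \begin{aligned} \| \tilde{\mathcal{N}}_{\epsilon,n}(\pi_n) \|_F^2 &= \| \mathcal{T}_{\epsilon,n} \circ \mathcal{N}_{\epsilon,n}(\pi_n) \|_F^2 \\ &\le \| \mathcal{T}_{\epsilon,n} \circ \mathcal{N}^{\otimes n}(\pi_n) \|_F^2 \\ &= \| \Pi_{\epsilon,n}\, (\mathcal{N}(\pi))^{\otimes n}\, \Pi_{\epsilon,n} \|_F^2 \\ &= \sum_{|v\rangle\ \epsilon\text{-typical eigenvector}} p_v^2 \\ &\le 2^{-n(S(\mathcal{N}(\pi))-3\epsilon)} , \end{aligned} $$

where we used $\dim T_{\epsilon,n} \le 2^{n(S(\mathcal{N}(\pi))+\epsilon)}$ (property (i')) and $p_v \le 2^{-n(S(\mathcal{N}(\pi))-\epsilon)}$ to derive the last inequality.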